Abstract

A  decision  making  procedure  is  an  essential  component  of  any  evidential  reasoning 
system  under  uncertainty.  That  is,  after  the  system  gathers  and  combines  all  of 
the available information, a decision has to be made as the final output (e.g.,
a diagnosis). In probability theory, for example, the final decision is the hypothesis
with the highest probability. Unfortunately, in Dempster-Shafer theory, deciding
the final output is more difficult because more uncertainty is retained,
even through the process of combining evidence. As a matter of fact, an effective
decision making procedure remains an open problem for Dempster-Shafer theory.
In  this  paper,  we  present  a  solution  to  this  problem.  The  solution  is  composed  of 
theoretical  analysis  and  the  construction  of  a  heuristic  decision  making  procedure. 
In  theoretical  analysis,  we  first  present  some  formal  relationships  among  ordered 
hypotheses  and  then  derive  the  sufficient  conditions  under  which  a  final  decision 
making  choice  is  sound.  When  the  sufficient  conditions  are  not  met,  heuristic 
combination  functions  are  proposed  to  make  the  final  choice.  Finally,  based  on 
the  theoretical  derivations  and  the  heuristic  combination  functions  proposed,  a 
decision  making  algorithm  is  given  for  the  theory. 


1      Introduction 

Reasoning  under  uncertainty  has  been  widely  and  extensively  investigated  in 
artificial  intelligence  and  other  related  fields.  Researchers  in  these  fields  have 
developed  many  methods  to  represent  uncertain  knowledge  and  draw  inferences 
from  them.  Among  these  methods  are  the  certainty  factor  model  in  MYCIN 
[Buchanan  and  Shortliffe,  1984],  Bayesian  probability  theory  (as  in  PROSPEC- 
TOR) [Duda  et  al.,  1976],  belief  networks  [Pearl,  1986b,  Pearl,  1986a],  Dempster- 
Shafer's  evidence  theory  [Shafer,  1976,  Gordon  and  Shortliffe,  1985],  and  Zadeh's 
possibility  theory  (fuzzy  logic)  [Zadeh,  1979].  One  of  the  key  issues  for  all  of  these 
methods is determining a decision making procedure; that is, after an expert
system gathers and combines all of the available information, a decision has to be
made as the final output (e.g., a diagnosis). In probability theory, for example, the
final decision is the hypothesis with the highest probability. Unfortunately, in
Dempster-Shafer theory, deciding the final output is more difficult because
more uncertainty is retained, even through the process of combining evidence. As a
matter of fact, an effective decision making procedure remains an open problem
for Dempster-Shafer theory. It is this problem that constitutes the topic of
this paper.

Dempster-Shafer theory is one of the major paradigms for reasoning under
uncertainty, and it has been actively and extensively explored from many perspectives
[Halpern and Fagin, 1990, Fagin and Halpern, 1989, Fagin and Halpern, 1990,
Smets, 1990]. There is also some work on moving the theory into practical use
[Shafer and Logan, 1987, Saffiotti, 1990, Deutsch-McLeish et al., 1990].
Therefore, the framework proposed here for the decision making procedure should be an
important and useful ingredient in the use and development of the theory.

The remainder of the paper is organized as follows. After Dempster-Shafer
theory is briefly introduced in Section 2, the decision making problem is
identified and discussed in Section 3. In Section 4, some formal relationships among
hypotheses are presented and proved. In Section 5, the sufficient conditions for
sound decision making are derived based on the formal relationships among
hypotheses and some intuitive observations about decision making processes.
For cases in which the sufficient conditions are not satisfied, a heuristic
combination function is proposed to make the final choice, and the properties, or
forms, of such functions are discussed. From the theoretical derivations and the
heuristic proposal, an effective decision making algorithm is given, along with some
comments. Finally, in Section 6, some concluding remarks are made and possible
future studies are identified.

2      Dempster-Shafer  Theory 

Let $\Theta$ be a set of mutually exclusive and exhaustive propositions about a
domain, e.g., a set of hypotheses for a diagnostic system. $\Theta$ is called the frame
of discernment. Let $2^\Theta$ denote the set of all subsets of $\Theta$. Elements of $2^\Theta$, i.e.,
subsets of $\Theta$, are the general propositions in the domain with which the theory is
concerned. Three basic functions are to be defined: the basic probability
assignment $m$, the belief function $\mathrm{Bel}$, and the plausibility function $\mathrm{Pl}$. They all define
a numerical quantity between 0 and 1 to indicate the degree of support provided
to a proposition by the available evidence; their domain is $2^\Theta$ and their
range is $[0, 1]$.

A function $m : 2^\Theta \to [0, 1]$ is a basic probability assignment if it satisfies

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \subseteq \Theta} m(A) = 1 \tag{1}$$

The quantity $m(A)$ represents the exact belief committed to the proposition
represented by $A$. Therefore, if

$$m(G) = 1 \quad \text{and} \quad m(A) = 0 \ \text{ for all } A \neq G,\ A \subseteq \Theta \tag{2}$$

then the evidence is certain, in that the decision making output must be
one of the elements of the set $G$. However, it does not say which element
of $G$, unless $G$ is a singleton set. On the other hand, the case in which $G = \Theta$
tells us nothing, i.e., it represents a situation of total ignorance or total lack of
knowledge.

The belief function $\mathrm{Bel} : 2^\Theta \to [0, 1]$ can be defined in terms of $m$:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B) \quad \text{for } A \subseteq \Theta \tag{3}$$

In fact, Shafer has shown that the basic probability assignment that produces a
given belief function is unique and can be recovered from the belief function by
the following formula:

$$m(A) = \sum_{B \subseteq A} (-1)^{|A - B|}\, \mathrm{Bel}(B) \quad \text{for } A \subseteq \Theta \tag{4}$$

where $|X|$ is the cardinality of the set $X$. Therefore, there is a one-to-one
correspondence between basic probability assignments and belief functions.
Finally, the plausibility function $\mathrm{Pl} : 2^\Theta \to [0, 1]$ is defined to be

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\neg A) \quad \text{for } A \subseteq \Theta \tag{5}$$

It can be shown that the plausibility function $\mathrm{Pl}$ carries exactly the same
information as $m$ and $\mathrm{Bel}$ do.
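The inversion in equation (4) can be illustrated with a minimal Python sketch. The two-element frame, the belief values, and the helper names `subsets` and `mass_from_belief` are assumptions made for this illustration and do not come from the paper.

```python
from itertools import combinations

def subsets(a):
    """All subsets of `a`, including the empty set and `a` itself."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def mass_from_belief(bel, a):
    """Equation (4): m(A) = sum over B within A of (-1)^|A - B| * Bel(B)."""
    return sum((-1) ** len(a - b) * bel[b] for b in subsets(a))

# Hypothetical belief function on the frame {x, y}.
x, y = frozenset({"x"}), frozenset({"y"})
frame = x | y
bel = {frozenset(): 0.0, x: 0.5, y: 0.0, frame: 1.0}

# Recover the basic probability assignment for every subset of the frame.
recovered = {a: mass_from_belief(bel, a) for a in subsets(frame)}
for a, mass in recovered.items():
    print(sorted(a), mass)
```

Under these assumed values the recovered mass is split evenly between {x} and the whole frame, which is the assignment that generates the belief function above through equation (3).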

Now let us see what these quantities mean intuitively, although they are
all equivalent in terms of information. If we think of the elements of $\Theta$ as points,
$m(A)$, $A \subseteq \Theta$, measures the total probability mass constrained to stay in $A$ but
not confined to any proper subset of $A$. It represents our ignorance, in that we
cannot subdivide this belief among the subsets of $A$. The quantity $\mathrm{Bel}(A)$ is then
the measure of the total probability mass constrained to stay somewhere in $A$.
Since $\mathrm{Bel}(\neg A)$ is the measure of the probability mass constrained to stay out of
$A$, $\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\neg A)$ is the measure of the total probability mass that can
move into $A$ (some of which is already in $A$ if $\mathrm{Bel}(A) > 0$). To put it another way,
$\mathrm{Bel}(A)$ is the lower probability of $A$ and $\mathrm{Pl}(A)$ is
the upper probability of $A$. Hence, an interval notation, $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$, is often
used to represent the range of the probability of $A$.
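As a small illustration of the interval $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$, the following Python sketch computes both bounds from a basic probability assignment using equations (3) and (5). The two-element frame and its masses are hypothetical, and the function names `bel` and `pl` are chosen here only for convenience.

```python
def bel(m, a):
    """Bel(A): total mass of the focal elements contained in A (equation 3)."""
    return sum(mass for b, mass in m.items() if b <= a)

def pl(m, a, frame):
    """Pl(A) = 1 - Bel(not A), where `not A` is the complement of A in the frame."""
    return 1.0 - bel(m, frame - a)

# Hypothetical frame and mass assignment, chosen only for illustration.
frame = frozenset({"x", "y"})
m = {frozenset({"x"}): 0.5, frame: 0.5}

# Print the interval [Bel, Pl] for each non-empty subset of the frame.
for h in (frozenset({"x"}), frozenset({"y"}), frame):
    print(sorted(h), [bel(m, h), pl(m, h, frame)])
```

The mass left on the whole frame is what separates the two bounds: it counts toward the plausibility of every non-empty subset but toward the belief of the frame only.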


Next we look at how distinct pieces of evidence are combined,
i.e., how the fundamental operation of uncertainty reasoning is accomplished in
Dempster-Shafer theory. Let $m_1$ and $m_2$ be two basic probability assignments
representing two uncertain pieces of evidence for the same frame of discernment,
$\Theta$. Dempster's combination rule, which is a generalization of Bayes' rule,
computes a new basic probability assignment, denoted $m_1 \oplus m_2$, that represents the
combined effect of $m_1$ and $m_2$ as follows.

$$\begin{aligned}
m(\emptyset) &= 0 \\
m(A) &= K \sum_{X \cap Y = A} m_1(X)\, m_2(Y), \quad A \subseteq \Theta,\ A \neq \emptyset \\
K &= \frac{1}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}
\end{aligned} \tag{6}$$

The purpose of $K$ is to redistribute the belief committed to $\emptyset$ by the intersection
operation to the other non-empty subsets with non-zero belief; this is the normalization
step, which makes $m_1 \oplus m_2$ an eligible basic probability assignment. Therefore,
if $1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y) = 0$, then $m_1 \oplus m_2$ does not exist. This occurs
when the combined belief functions invest no belief in intersecting sets, and $m_1$
and $m_2$ are then said to be totally or flatly contradictory. The quantity $\log(K) =
\mathrm{Con}(\mathrm{Bel}_1, \mathrm{Bel}_2)$ is called the weight of conflict between $\mathrm{Bel}_1$ and $\mathrm{Bel}_2$.

If $m_1 \oplus m_2$ exists for given $m_1$ and $m_2$, the corresponding belief function,
$\mathrm{Bel}_1 \oplus \mathrm{Bel}_2$, and the plausibility function, $\mathrm{Pl}_1 \oplus \mathrm{Pl}_2$, are then easily computed
from $m_1 \oplus m_2$ by the definitions.

An important property of Dempster's rule is that it is commutative and
associative. This is desirable because evidence aggregation should be independent
of the order in which the evidence is gathered.
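Dempster's rule in equation (6) can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `combine`, the tolerance used to detect total contradiction, and the two mass assignments are all assumptions made here.

```python
def combine(m1, m2):
    """Dempster's rule: return m1 (+) m2 as a dict from frozenset to mass."""
    raw = {}
    conflict = 0.0  # mass that the intersections send to the empty set
    for x, mx in m1.items():
        for y, my in m2.items():
            z = x & y
            if z:
                raw[z] = raw.get(z, 0.0) + mx * my
            else:
                conflict += mx * my
    if abs(1.0 - conflict) < 1e-12:
        raise ValueError("totally contradictory evidence: combination does not exist")
    k = 1.0 / (1.0 - conflict)  # normalization constant K
    return {a: k * v for a, v in raw.items()}

# Hypothetical pieces of evidence over the frame {a, b, c}.
frame = frozenset("abc")
m1 = {frozenset("a"): 0.6, frame: 0.4}
m2 = {frozenset("b"): 0.5, frame: 0.5}

combined = combine(m1, m2)
swapped = combine(m2, m1)
for a in combined:
    print(sorted(a), round(combined[a], 4), round(swapped[a], 4))
```

Calling `combine` in both orders gives the same masses up to rounding, which is the commutativity property noted above.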

3  The  Decision  Making  Problem  for  the  Theory 

The  problem  of  an  effective  decision  making  procedure  for  Dempster-Shafer  the- 
ory has  been  recognized  by  several  researchers: 


"Dempster  and  Shafer's  theory  of  plausible  inference  provides  a 
methodology  for  the  representation  and  combination  of  evidence. 
However,  several  questions  need  to  be  addressed  before  it  can  be 
used  effectively  for  AI  applications.  Perhaps,  the  most  pressing 
problem  is  that  no  effective  decision  making  procedure  is  available." 
[Bhatnagar and Kanal, 1986] (p. 24).

"However,  more  work  needs  to  be  done  with  this  theory  before  it  is 
on  a  solid  foundation.  Several  problems  remain  as  obvious  topics  for 
future  research.  Perhaps  the  most  pressing  is  that  no  effective  decision 
making procedure is available." [Barnett, 1981] (p. 874).

The problem is this: when it comes to making a decision, which of the
estimates provided by $m$, $\mathrm{Bel}$, and $\mathrm{Pl}$ should be used? If we have $h_1 : [0.5, 0.8]$ and
$h_2 : [0.4, 0.9]$, which of $h_1$ and $h_2$ should be chosen as the final output? The
difficulty of the decision stems from the fact that although $\mathrm{Bel}(h_1)$ is greater than
$\mathrm{Bel}(h_2)$, $\mathrm{Pl}(h_1)$ is less than $\mathrm{Pl}(h_2)$. This is a very important problem because
an expert system has to make a decision at the end of information gathering and
aggregation.

Why  is  it  a  problem  for  Dempster-Shafer  theory?  The  difference  between  the 
Bayesian  approach  and  the  Dempster-Shafer  approach  is  that  the  former  sup- 
presses ignorance  up  front  when  prior  probabilities  are  determined,  while  the  lat- 
ter represents  ignorance  and  uncertainty  explicitly  and  retains  them  through  the 
combination  process.  This  may  suggest  one  advantageous  use  of  the  Dempster- 
Shafer  approach;  that  is,  we  initially  retain  ignorance.  As  the  evidence  narrows 
down  the  possibilities  we  may  or  may  not  have  a  decision.  If  not,  we  then  apply 
some  ad  hoc  selective  methods  to  make  our  final  choice.  In  this  paper,  we  at- 
tempt to  attack  the  problem  by  deriving  the  theoretical  conditions  under  which 
a  sound  decision  can  be  made  and  by  proposing  heuristic  combination  functions 
to  accommodate  other  cases  where  the  derived  conditions  are  not  satisfied. 


4  Some  Formal  Relationships  among  Hypotheses 

Before  we  can  devise  a  decision  making  algorithm  for  Dempster-Shafer  theory 
we  need  to  define  some  terms  and  prove  some  preliminary  formal  relationships 
among  hypotheses  which  are  ordered  by  their  current  belief  values. 

Definition 4.1 (Largest Belief Subset) A subset $A$ is said to be the largest
belief subset if the following conditions are satisfied by $A$.

• $A$ is a proper subset of $\Theta$, that is, $A \subset \Theta$;

• $\mathrm{Bel}(A) \geq \mathrm{Bel}(X)$ for all $X \subset \Theta$.

Clearly, $\Theta$ would be the largest belief subset if $A$ were not defined to be a
proper subset of $\Theta$. However, $\Theta$ is not an interesting candidate for decision since
its selection is vacuous.

Definition 4.2 (Largest Belief with Minimum Element Subset) A subset
$A$ of $\Theta$ is said to be the largest belief with minimum element subset if

• $A$ is the largest belief subset;

• No proper subset of $A$ has the same belief value as $A$, that is, $\mathrm{Bel}(A) >
\mathrm{Bel}(X)$ for all $X \subset A$.

Definition 4.3 (Largest Belief with Maximum Element Subset) A subset
$A$ of $\Theta$ is said to be the largest belief with maximum element subset if

• $A$ is the largest belief subset;

• If there exists $A'$ such that $\mathrm{Bel}(A') = \mathrm{Bel}(A)$, then $|A| \geq |A'|$, where $|X|$
is the cardinality of set $X$.

From these definitions, certain relationships among these subsets can be
shown to exist.


4.1      Most  specific  hypotheses 

Theorem 4.1 If $A$ is the largest belief with minimum element subset of $\Theta$, $B$
the second largest belief with minimum element subset of $\Theta$, and $A \neq B$, then
exactly one of the following relations holds.

1. $B \subset A$.

2. $A \cup B = \Theta$.

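Proof: We prove it by contradiction. Assume that the claim were not true;
then we would have $B \not\subset A$ and $A \cup B \neq \Theta$. From $A \neq B$ and $B \not\subset A$, we
have $A \cap B \subset B$. By the definition of $\mathrm{Bel}(X)$, $\mathrm{Bel}(B) \geq \mathrm{Bel}(A \cap B)$. But
$\mathrm{Bel}(B) \neq \mathrm{Bel}(A \cap B)$, for, if they were equal, then $A \cap B$ would be the second
largest belief with minimum element subset by the definition, contrary to the
assumption that $B$ was. Therefore,

$$\mathrm{Bel}(B) > \mathrm{Bel}(A \cap B). \tag{7}$$

On the other hand,

$$\mathrm{Bel}(A \cup B) = \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B) + \sum_{X \subseteq A \cup B,\ X \not\subseteq A,\ X \not\subseteq B} m(X) \tag{8}$$

Since $\mathrm{Bel}(B) > \mathrm{Bel}(A \cap B)$ and $m(X) \geq 0$ for all $X \subseteq \Theta$,

$$\mathrm{Bel}(A \cup B) > \mathrm{Bel}(A). \tag{9}$$

From $A \neq B$ and $B \not\subset A$, we also have $A \cup B \supset A$. Together with $A \cup B \neq \Theta$,
$A \cup B$ should be the largest belief with minimum element subset; a contradiction.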

Now what is left to be shown is that one and only one of the two relations
holds. If $B \subset A$, then $A \cup B = A$. By the definition of $A$, $A \neq \Theta$; thus $A \cup B \neq \Theta$.
If $A \cup B = \Theta$, then it must be the case that $B \not\subset A$; for, if not, then $B \subset A$ and
$A \cup B = A \neq \Theta$ by the definition of $A$, contrary to the assumption that $A \cup B = \Theta$.
$\Box$

For convenience later in the paper, the theorem is rewritten as the
following corollary.

Corollary 4.1 If $A$ is the largest belief with minimum element subset of $\Theta$, $B$
the second largest belief with minimum element subset of $\Theta$, and $A \neq B$, then
exactly one of the following relations holds.

1. $B \subset A$.

2. $A \cap B = \emptyset$, and $A \cup B = \Theta$.

3. $A \cap B \neq \emptyset$, and $A \cup B = \Theta$.

Proof: It is an immediate result of Theorem 4.1. $\Box$

4.2  Most  general  hypotheses 

Theorem 4.2 If $A$ is the largest belief with maximum element subset of $\Theta$, $B$
the second largest belief with maximum element subset of $\Theta$, and $A \neq B$, then
exactly one of the following relations holds.

1. $B \subset A$.

2. $A \cup B = \Theta$.

Proof: By similar arguments, we would have

$$\mathrm{Bel}(A \cup B) \geq \mathrm{Bel}(A). \tag{10}$$

If $B \not\subset A$, then $A \cup B \supset A$. But if $A \cup B \neq \Theta$, then $A \cup B$ would be the largest
belief with maximum element subset; a contradiction.

The claim that exactly one relation holds can be proved analogously. $\Box$

4.3  Generalization 

Theorem 4.3 Let $A_k$ be the $k$th largest belief with minimum (maximum) element
subset of $\Theta$. Then $L = [A_0, A_1, A_2, \ldots, A_l]$ is a sequence of hypotheses in
descending order by their belief values, where $A_0 = \Theta$. Then, for any $k$ $(1 \leq k < l)$,
exactly one of the following relations holds.

1. $A_{k+1} \subset A_k$.

2. $A_k \cap A_{k+1} = \emptyset$, and $A_k \cup A_{k+1} = A_j$ for some $j$, $0 \leq j < k$.

3. $A_k \cap A_{k+1} \neq \emptyset$, and $A_k \cup A_{k+1} = A_j$ for some $j$, $0 \leq j < k$.

Proof: The proof procedure is similar to the above. If none of the relations holds,
$A_k \cup A_{k+1}$, being different from $A_k$, would be in the sequence before $A_k$. $\Box$

[Figure 1 (lattice diagram): {a,b,c} .4 (1.0) at the top; {a} .3 (.3), {b} .0 (.0), {c} .2 (.2) at the bottom; the two-element subsets lie in between.]

Figure 1: An example to demonstrate the relationships among hypotheses
ordered by their belief values, where $\Theta = \{a, b, c\}$, the numbers before the
parentheses are $m(X)$, and the numbers in the parentheses are $\mathrm{Bel}(X)$.


An example will help to convey the essence of Theorem 4.3. Suppose
that $\Theta = \{a, b, c\}$ and that the (combined) basic probability assignment is as
follows: $m(\{a\}) = .3$, $m(\{c\}) = .2$, $m(\{a, b\}) = .1$, $m(\Theta) = .4$, and $m(X) =
.0$ for all other $X \subset \Theta$. Then it is straightforward to calculate $\mathrm{Bel}(A)$ $(A \subseteq \Theta)$
from $m$. The results are shown in Fig. 1. Therefore, $[\Theta, \{a, c\}, \{a, b\}, \{a\}, \{c\}]$
is the sequence of the ordered hypotheses, according to the $k$th largest belief with
minimum element subsets. Now it is easy to check that, for any $k$ $(1 \leq k < 4)$,
either $A_{k+1} \subset A_k$, or there exists $j$ such that $0 \leq j < k$ and $A_{k+1} \cup A_k = A_j$.
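The ordering in this example can be reproduced with a short Python sketch. The helper names and the use of exact fractions are choices made here for illustration; the mass assignment is the one given in the example, and the filter drops any subset that has a proper subset (the empty set included) with the same belief value.

```python
from fractions import Fraction
from itertools import combinations

frame = frozenset("abc")
# Mass assignment from the example; exact fractions avoid rounding when testing ties.
m = {frozenset("a"): Fraction(3, 10), frozenset("c"): Fraction(2, 10),
     frozenset("ab"): Fraction(1, 10), frame: Fraction(4, 10)}

def bel(a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)

def proper_subsets(a):
    """All proper subsets of `a`, the empty set included."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items)) for c in combinations(items, r)]

# Keep a non-empty proper subset only if every proper subset of it has strictly less belief.
candidates = [x for x in proper_subsets(frame)
              if x and all(bel(y) < bel(x) for y in proper_subsets(x))]
sequence = [frame] + sorted(candidates, key=bel, reverse=True)

def relation_holds(k):
    """Either A_{k+1} is a proper subset of A_k, or their union appears earlier in the sequence."""
    cur, nxt = sequence[k], sequence[k + 1]
    return nxt < cur or (cur | nxt) in sequence[:k]

checks = [relation_holds(k) for k in range(1, len(sequence) - 1)]
print([sorted(s) for s in sequence], checks)
```

With this filter, {b} and {b,c} drop out because their belief values equal those of the empty set and of {c} respectively, leaving the sequence listed in the example.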


5      The  Decision  Making  Procedure 

In  this  section,  we  begin  with  some  observations  and  intuitions  behind  decision 
making  processes  at  the  end  of  evidence  gathering  and  aggregation.  Then,  we 
move  on  to  derive  the  sufficient  conditions  under  which  a  sound  decision  can  be 
made.  For  other  cases  where  the  conditions  are  not  met,  a  heuristic  combination 
function  is  proposed  to  make  the  final  choice.  The  coefficients  in  the  heuristic 
function  can  be  learned  to  best  fit  the  domain  under  investigation.  Based  on 
the  theoretical  derivations  and  the  heuristic  combination  function  proposed,  an 
algorithm  for  decision  making  is  given.  Finally,  some  comments  concerning  the 
algorithm  and  empirical  experiments  are  made. 

5.1      Intuitions  and  observations 

Suppose that, after gathering and aggregating the available evidence, an expert
system has the following quantities: $m(h)$, $\mathrm{Bel}(h)$, $\mathrm{Pl}(h)$, for all $h \subseteq \Theta$. If $h_0$ is
chosen as the final output, what can be said about the properties or
characteristics of $h_0$ in terms of $m$, $\mathrm{Bel}$, and $\mathrm{Pl}$? Since $m$, $\mathrm{Bel}$, and $\mathrm{Pl}$ are numerical values, this
amounts to asking what relations should hold between the chosen hypothesis $h_0$
and all other hypotheses, expressed in $m$, $\mathrm{Bel}$, and $\mathrm{Pl}$. The following are some
intuitive observations. (Note: since the interval notation is often and easily used,
the following relations do not contain $m$.)

• If $\mathrm{Bel}(h_1) > \mathrm{Bel}(h_2)$, then $h_1$ is more probable than $h_2$ as the final output.

• If $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, then $h_1$ has more potential to become more probable
than $h_2$ as the final output.

• Thus, if both $\mathrm{Bel}(h_1) > \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, then $h_1$ should be
preferred to $h_2$ as the final output given the currently available evidence.

Unfortunately, though, it is not always possible to have both $\mathrm{Bel}(h_1) >
\mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$ satisfied for some $h_1$ and $h_2$. Thus, we will next
derive the conditions that guarantee the satisfaction of both inequality relations.
As is known, every interval converges to a point, i.e., $\mathrm{Bel}(h) = \mathrm{Pl}(h)$, when
complete information is given. In this case, decision making reverts to probability theory.
Another intuition is that a more specific hypothesis $A$ is always preferred to
a more general hypothesis $A'$ if they have the same probability interval, i.e., if
$\mathrm{Bel}(A) = \mathrm{Bel}(A')$, $\mathrm{Pl}(A) = \mathrm{Pl}(A')$, and $A \subset A'$. The reason is simple: the more
specific the hypothesis, the more informative it is.

5.2      Sufficient  conditions  for  sound  decision  making 

Given  the  formal  relationships  among  ordered  hypotheses  derived  in  the  last  sec- 
tion, we  can  show  that  certain  numerical  relations  exist  among  them  as  well, 
under  certain  conditions.  These  conditions  are  then  considered  to  be  the  suffi- 
cient conditions  to  make  a  sound  decision,  according  to  the  above  intuitions  and 
observations. 

Theorem 5.1 If $A$ is the largest belief with minimum (maximum) element subset
of $\Theta$, $B$ the second largest belief with minimum (maximum) element subset of $\Theta$,
and $A \neq B$, then

$$\mathrm{Pl}(A) \geq \mathrm{Pl}(B) \tag{11}$$

holds for the following two cases:

1. $B \subset A$.

2. $A \cap B = \emptyset$, $A \cup B = \Theta$.

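Proof: Each case is dealt with separately.

Case 1: $B \subset A$.
From $B \subset A$, we have $\neg A \subset \neg B$. Thus, $\mathrm{Bel}(\neg A) \leq \mathrm{Bel}(\neg B)$ from the definition
of $\mathrm{Bel}(X)$. By the definition of $\mathrm{Pl}(X)$,

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\neg A) \tag{12}$$

$$\mathrm{Pl}(B) = 1 - \mathrm{Bel}(\neg B) \tag{13}$$

thus, $\mathrm{Pl}(A) \geq \mathrm{Pl}(B)$ follows from $\mathrm{Bel}(\neg A) \leq \mathrm{Bel}(\neg B)$.

Case 2: $A \cap B = \emptyset$, $A \cup B = \Theta$.
Then, $\neg A = B$ and $\neg B = A$. Thus,

$$\mathrm{Pl}(A) = 1 - \mathrm{Bel}(\neg A) = 1 - \mathrm{Bel}(B) \tag{14}$$

$$\mathrm{Pl}(B) = 1 - \mathrm{Bel}(\neg B) = 1 - \mathrm{Bel}(A) \tag{15}$$

Therefore, $\mathrm{Bel}(A) \geq \mathrm{Bel}(B)$ implies $\mathrm{Pl}(A) \geq \mathrm{Pl}(B)$. $\Box$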
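The two cases of the proof can be spot-checked numerically. The Python sketch below draws random basic probability assignments on a four-element frame and counts violations of the two implications; the frame, the sampling scheme, and the helper names are assumptions made for this illustration, and the check does not replace the proof.

```python
import random
from fractions import Fraction
from itertools import combinations

def subsets(s, min_size=0):
    """Subsets of `s` with at least `min_size` elements, smallest first."""
    items = sorted(s)
    return [frozenset(c) for r in range(min_size, len(items) + 1)
            for c in combinations(items, r)]

def random_mass(frame, rng):
    """A random basic probability assignment with exact rational masses."""
    focal = subsets(frame, min_size=1)
    weights = [rng.randint(0, 5) for _ in focal]
    weights[-1] += 1  # the last subset is the frame itself; this keeps the total positive
    total = sum(weights)
    return {a: Fraction(w, total) for a, w in zip(focal, weights)}

def bel(m, a):
    return sum(v for b, v in m.items() if b <= a)

def pl(m, a, frame):
    return 1 - bel(m, frame - a)

frame = frozenset("abcd")
rng = random.Random(0)
violations = 0
for _ in range(200):
    m = random_mass(frame, rng)
    for a in subsets(frame, 1):
        # Case 1: B contained in A should give Pl(A) >= Pl(B).
        for b in subsets(a, 1):
            if pl(m, a, frame) < pl(m, b, frame):
                violations += 1
        # Case 2: B is the complement of A and Bel(A) >= Bel(B).
        b = frame - a
        if b and bel(m, a) >= bel(m, b) and pl(m, a, frame) < pl(m, b, frame):
            violations += 1
print(violations)
```

Exact fractions are used so that the comparisons are not affected by floating-point rounding.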

Note that this theorem essentially establishes sufficient
conditions under which both $\mathrm{Bel}(h_1) \geq \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) \geq \mathrm{Pl}(h_2)$ are satisfied
simultaneously for some $h_1$ and $h_2$.

Given that $A$ is the largest belief with minimum element subset of $\Theta$, $B$ the
second largest belief with minimum element subset of $\Theta$, and $A \neq B$, Corollary 4.1
(and thus Theorem 4.1) says that one and only one of the three relations between $A$
and $B$ must hold. Theorem 5.1 shows, moreover, that in two of these three
cases both $\mathrm{Bel}(A) \geq \mathrm{Bel}(B)$ and $\mathrm{Pl}(A) \geq \mathrm{Pl}(B)$ are
satisfied simultaneously.

Unfortunately, the generalization of Theorem 5.1, unlike its counterpart
Theorem 4.3, is weaker, as shown in the following theorem.

Theorem 5.2 Let $A_k$ be the $k$th largest belief with minimum (maximum) element
subset of $\Theta$. Then $L = [A_0, A_1, A_2, \ldots, A_l]$ is a sequence of hypotheses in
descending order by their belief values, where $A_0 = \Theta$. Then, for any $k$ $(1 \leq k < l)$,

$$\mathrm{Pl}(A_k) \geq \mathrm{Pl}(A_{k+1}) \tag{16}$$

holds if

$$A_{k+1} \subset A_k. \tag{17}$$

$\Box$

Although Theorem 5.2 is weaker than its counterpart Theorem 4.3, it should
be noticed that it only gives sufficient conditions for both $\mathrm{Bel}(h_1) \geq
\mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) \geq \mathrm{Pl}(h_2)$ to be satisfied simultaneously. In other words, for cases
2 and 3 in Theorem 4.3, it is still possible that both $\mathrm{Bel}(h_1) \geq \mathrm{Bel}(h_2)$ and
$\mathrm{Pl}(h_1) \geq \mathrm{Pl}(h_2)$ are satisfied at the same time. However, we cannot count on
it. Fortunately, the decision making algorithm designed later in the
paper will correctly and automatically take care of all the cases in Theorem 4.3
if the heuristic combination function has the property that, if both $\mathrm{Bel}(h_1) >
\mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, then $h_1$ is preferred to $h_2$ by the heuristic function
evaluation. This is a natural requirement for a heuristic combination function to
be rational (see the discussion of heuristic combination functions in Section 5.3).

5.3      Heuristic  combination  functions  for  decision  making 

When the derived sufficient conditions among hypotheses are not met, it is not
guaranteed that both $\mathrm{Bel}(h_1) \geq \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) \geq \mathrm{Pl}(h_2)$ are satisfied at the
same time; thus, the final choice must be made based on some heuristic
combination function of $\mathrm{Bel}(h)$ and $\mathrm{Pl}(h)$.

There are several concerns about the properties or forms of heuristic
combination functions.

1. Rationality. If both $\mathrm{Bel}(h_1) > \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, then $h_1$ must
be preferred to $h_2$ by the heuristic evaluation; that is, if $f$ is a heuristic function,
then it must be the case that $f(h_1) > f(h_2)$.

2. Hard evidence. Because $\mathrm{Bel}(h)$ represents the supportive evidence
whereas $\mathrm{Pl}(h)$ only represents the potential (it also represents doubt in $h$, since
$\mathrm{Pl}(h) = 1 - \mathrm{Bel}(\neg h)$), the heuristic combination functions should give more
weight to $\mathrm{Bel}(h)$ than to $\mathrm{Pl}(h)$, unless otherwise required by the domain and
specified by the user.

3. Distributions. If the distribution of convergent points (where $\mathrm{Bel}(h) =
\mathrm{Pl}(h)$) is known, the heuristic combination functions should take it into
consideration. But these distributions are rarely available.

4. Simplicity. A simple function $f(h)$, e.g., a linear combination of $\mathrm{Bel}(h)$
and $\mathrm{Pl}(h)$, is preferred to a complex one, unless other information about the
domain, e.g., distributions of convergent points, indicates otherwise.

Among these concerns, only the rationality requirement is strongly suggested;
the others depend upon the circumstances in question.

The following is an example of a generic heuristic combination function which
meets the above mentioned concerns.

$$f(h) = c\,\mathrm{Bel}(h) + (1 - c)\,\mathrm{Pl}(h), \quad \text{where } 0.5 < c < 1. \tag{18}$$

Clearly, if both $\mathrm{Bel}(h_1) > \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, then $f(h_1) > f(h_2)$. Thus,
the rationality requirement is met. Since $c > 0.5$, $\mathrm{Bel}(h)$ gets more weight than
$\mathrm{Pl}(h)$. No distribution is known. And finally, it is a simple linear function. The
coefficient $c$ in this simple heuristic combination function may vary drastically
for different domains; e.g., in medical diagnosis, one may want to assign a high
value for $c$ since the result may be a life-and-death issue.
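As a brief illustration of the linear function in (18), the Python sketch below evaluates it on the two intervals used as the motivating example in Section 3, $h_1 : [0.5, 0.8]$ and $h_2 : [0.4, 0.9]$. The function signature and the default $c = 0.6$ are assumptions made here.

```python
def f(bel, pl, c=0.6):
    """Linear heuristic of equation (18): c * Bel(h) + (1 - c) * Pl(h)."""
    if not 0.5 < c < 1:
        raise ValueError("c is expected to satisfy 0.5 < c < 1")
    return c * bel + (1 - c) * pl

h1 = (0.5, 0.8)  # [Bel(h1), Pl(h1)]
h2 = (0.4, 0.9)  # [Bel(h2), Pl(h2)]

score1, score2 = f(*h1), f(*h2)
print(round(score1, 2), round(score2, 2))
```

For these two intervals the scores tie exactly at $c = 0.5$, so any admissible $c$ prefers $h_1$, the hypothesis with the larger belief.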

5.4  Learning  the  coefficients  of  heuristics 

To  improve  diagnostic  accuracy,  the  coefficients  in  the  heuristic  combination  func- 
tion can  be  adjusted  through  learning  to  best  fit  the  domain  under  investigation. 
To  do  so,  a  set  of  solved  cases  is  used  to  train  the  heuristic  combination  function 
so  that  the  optimal  values  for  the  coefficients  in  the  function  are  found.  Con- 
straints on  the  coefficients  can  be  imposed  before  the  training.  For  example,  for 
the  linear  heuristic  combination  function  above,  a  set  of  solved  cases  may  help  to 
find  that  the  best  value  of  c  is  0.65. 
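The paper does not specify a training method, so the following Python sketch shows only one simple possibility: a grid search that picks the value of $c$ that classifies the most solved cases correctly. The three solved cases are invented for illustration and were chosen so that the search lands on 0.65; the names `accuracy` and `fit_c` are likewise hypothetical.

```python
def accuracy(c, cases):
    """Fraction of solved cases in which the top-scoring hypothesis is the known answer."""
    hits = 0
    for intervals, answer in cases:
        best = max(intervals,
                   key=lambda h: c * intervals[h][0] + (1 - c) * intervals[h][1])
        hits += best == answer
    return hits / len(cases)

def fit_c(cases, grid):
    """Return the first grid value of c with the highest accuracy."""
    return max(grid, key=lambda c: accuracy(c, cases))

# Hypothetical solved cases: hypothesis -> (Bel, Pl), paired with the known correct answer.
cases = [
    ({"h1": (0.5, 0.8), "h2": (0.4, 0.9)}, "h1"),
    ({"h1": (0.5, 0.6), "h2": (0.3, 1.0)}, "h2"),
    ({"h1": (0.6, 0.7), "h2": (0.5, 0.88)}, "h1"),
]
# Constraint imposed before training: c stays strictly between 0.5 and 1.
grid = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]

best_c = fit_c(cases, grid)
print(best_c, accuracy(best_c, cases))
```

A finer grid, or a different objective such as a margin between the correct hypothesis and the runner-up, would fit the same framework.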

5.5  The  decision  making  algorithm 

The combination of the theoretical derivations and the generic heuristic proposed
results in the following effective decision making algorithm for Dempster-Shafer
theory. Note that the algorithm, by default, works on the most specific
hypotheses, unless requested otherwise.

Algorithm:

1. Sort all the hypotheses according to their current belief values, $\mathrm{Bel}(h)$.

2. If the most general hypotheses are preferred, then arrange the hypotheses according
to their generality, and denote the final sorted list by $L = [A_1, A_2, \ldots, A_l]$.

3. Otherwise (i.e., the most specific hypotheses are preferred), eliminate every $A'$
for which there exists $A$ such that $\mathrm{Bel}(A) = \mathrm{Bel}(A')$ and $A \subset A'$, and denote the
final sorted list by $L = [A_1, A_2, \ldots, A_l]$.

4. If $l = 1$, then output $A_1$ as the final choice, and stop.¹

5. Test the sufficient conditions; if any of the sufficient conditions is satisfied,
then let $T = A_{l-1}$. If $l = 2$, then output $A_1$ (a sound decision) and stop.

6. Otherwise, let $T$ = Heuristics_evaluation($f$, $A_{l-1}$, $A_l$).

7. Remove $A_{l-1}$ and $A_l$ from $L$, and append $T$ to $L$ at the end.

8. $l = l - 1$.

9. Go to step 4.

Procedure Heuristics_evaluation(f, h1, h2)

/* f is a user-supplied heuristic combination function */
If f(h1) > f(h2) then
    return h1
else return h2
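The steps above can be sketched in Python as follows. This is one reading of the algorithm, not a definitive implementation: the input is assumed to be already sorted and filtered (steps 1 to 3), the sufficient-condition test covers the subset case and the complementary case from Section 5.2, and the names `decide` and `sound` are introduced here. The numbers are the belief and plausibility values of the Fig. 1 example.

```python
def decide(hypotheses, frame, f):
    """hypotheses: list of (subset, bel, pl), sorted by descending belief.

    Returns the chosen subset and whether the last comparison was settled
    by a sufficient condition rather than by the heuristic.
    """
    work = list(hypotheses)
    sound = True
    while len(work) > 1:                      # step 4: stop when one hypothesis is left
        upper, lower = work[-2], work[-1]     # A_{l-1} and A_l
        a, b = upper[0], lower[0]
        if b < a or (not (a & b) and (a | b) == frame):
            winner, sound = upper, True       # step 5: a sufficient condition holds
        else:                                 # step 6: fall back on the heuristic
            winner = upper if f(upper[1], upper[2]) > f(lower[1], lower[2]) else lower
            sound = False
        work[-2:] = [winner]                  # steps 7 and 8: replace the pair by T
    return work[0][0], sound

def linear(bel, pl, c=0.6):
    """Linear heuristic combination function with c = 0.6."""
    return c * bel + (1 - c) * pl

frame = frozenset("abc")
ordered = [(frozenset("ac"), 0.5, 1.0), (frozenset("ab"), 0.4, 0.8),
           (frozenset("a"), 0.3, 0.8), (frozenset("c"), 0.2, 0.6)]

choice, sound = decide(ordered, frame, linear)
print(sorted(choice), sound)
```

On this input the pair {a}, {c} is settled by the heuristic, {a,b}, {a} by the subset condition, and the final pair {a,c}, {a,b} by the heuristic again, so the returned flag marks the final choice as heuristic rather than sound.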

Now let us see how the algorithm correctly and automatically takes care of
all the cases in Theorem 4.3. If the heuristic combination function in
procedure Heuristics_evaluation() has the rationality property, then, whenever both
$\mathrm{Bel}(h_1) > \mathrm{Bel}(h_2)$ and $\mathrm{Pl}(h_1) > \mathrm{Pl}(h_2)$, $h_1$ will prevail even if the sufficient
conditions are not satisfied. This means that the rationality property of these functions
captures all the cases which the sufficient conditions miss.

¹ The algorithm can be easily modified to output several equally plausible hypotheses.

Let us apply the algorithm to the example in Fig. 1. First, assume that we
use the linear combination function defined in (18) with $c = 0.6$. That is,

$$f(h) = 0.6\,\mathrm{Bel}(h) + 0.4\,\mathrm{Pl}(h) \tag{19}$$


Then  the  computation  of  various  quantities  is  carried  out  and  the  results  are 
shown  in  Table  1. 


h        m(h)   Bel(h)   Pl(h)   f(h)
{a,c}    0.0    0.5      1.0     0.70
{a,b}    0.1    0.4      0.8     0.56
{a}      0.3    0.3      0.8     0.50
{c}      0.2    0.2      0.6     0.36

Table 1: An example to illustrate the decision making algorithm

Going through the algorithm with the numbers in Table 1 results in $\{a, c\}$
being the final output. Note that $\{a, c\}$ and $\{a, b\}$ do not satisfy the sufficient
conditions; but, since $\mathrm{Bel}(\{a, c\}) > \mathrm{Bel}(\{a, b\})$ and $\mathrm{Pl}(\{a, c\}) > \mathrm{Pl}(\{a, b\})$,
$\{a, c\}$ is the preferred choice by the heuristic evaluation. This is an instance
where the rationality property of the heuristic combination function captures
what the sufficient conditions missed.


5.6      Comments  and  Experiments 

To briefly summarize, the contributions of the theoretical derivations
and of the proposed framework for setting up heuristic combination functions are
several-fold. First, the sufficient conditions precisely identify the cases in which a
decision is sound. If such a case arises in practice, one can have complete
confidence in the final output (Step 5). Without such theoretical results,
one can never be sure; heuristics alone can never guarantee anything. Second,
the sufficient conditions directly relate to and indirectly lead to the rationality
requirement for heuristic combination functions. Third, some formal
relationships among hypotheses are made clear and explicit by various theorems. Those
relationships among hypotheses are certainly a basis for further investigation
of this problem and some other problems of the theory. Fourth, the heuristic
proposed provides a flexible framework for setting up heuristic combination functions
to meet different needs. Fifth, the decision making algorithm devised is easy to
implement in practical systems. Some projects are being planned to make use of
this decision making algorithm in various domains.

6      Concluding  Remarks 

In this paper, we addressed an important problem for reasoning under uncertainty
in the Dempster-Shafer framework: the decision making procedure. It is important
because  an  expert  system  has  to  make  a  final  decision  at  the  end  of  informa- 
tion gathering  and  combination.  After  Dempster-Shafer  theory  was  introduced, 
the  open  problem  of  effective  decision  making  procedure  for  the  theory  was  then 
identified  and  discussed.  To  propose  solutions  to  the  problem,  some  formal  rela- 
tionships among  ordered  hypotheses  were  presented  and  proved.  Based  on  these 
formal  relationships  among  hypotheses,  the  sufficient  conditions  for  sound  de- 
cision making  were  derived.  In  case  of  the  sufficient  conditions  not  being  met, 
heuristic  combination  functions  were  proposed  to  make  the  final  choice.  The 
properties  or  forms  of  these  functions  were  emphasized.  From  the  theoretical 
results  and  the  heuristic  proposed,  an  effective  decision  making  algorithm  for 
Dempster-Shafer  theory  was  devised. 

For future studies, we plan to apply this decision making algorithm to various
domains and to propose specific heuristic combination functions for them. Doing
so may allow us to further characterize the heuristic combination functions in
greater detail.